Smart Paper Notes

YOLO-FaceV2: A Scale and Occlusion Aware Face Detector

Ziping Yu , Hongbo Huang , Weijun Chen , Yongxin Su , Yahui Liu , Xiuying Wang

Category: Computer Vision

2022-08-03

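In recent years, face detection algorithms based on deep learning have made great progress. These algorithms can generally be divided into two categories: two-stage detectors such as Faster R-CNN and one-stage detectors such as YOLO. Because they offer a better balance between accuracy and speed, one-stage detectors have been widely used in many applications. In this paper, we propose a real-time face detector based on the one-stage detector YOLOv5, named YOLO-FaceV2. We design a Receptive Field Enhancement module called RFE to enhance the receptive field for small faces, and use NWD Loss to compensate for the sensitivity of IoU to the location deviation of tiny objects. For face occlusion, we present an attention module named SEAM and introduce Repulsion Loss to address it. Moreover, we use a weight function, Slide, to address the imbalance between easy and hard samples, and use information about the effective receptive field to design the anchors. Experimental results on the WiderFace dataset show that our face detector outperforms YOLO and its variants on all of the easy, medium, and hard subsets. Source code: https://github.com/krasjet-yu/yolo-facev2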
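
The abstract does not spell out the NWD formula, so the following is only a minimal sketch of why a Wasserstein-style similarity is less sensitive than IoU for tiny boxes. It assumes the commonly used normalized Gaussian Wasserstein distance, in which a box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and covariance diag(w^2/4, h^2/4). The normalizing constant `c` and the example boxes are illustrative assumptions, not values taken from the paper.

```python
import math

def iou(a, b):
    """IoU of two axis-aligned boxes given as (cx, cy, w, h)."""
    # Overlap extent along each axis, clamped at zero when the boxes are disjoint.
    iw = max(0.0, min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2))
    ih = max(0.0, min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union

def nwd(a, b, c=10.0):
    """Normalized Wasserstein distance between boxes modeled as Gaussians.

    `c` is a hypothetical normalizing constant chosen only for illustration.
    """
    # Squared 2-Wasserstein distance between N((cx, cy), diag(w^2/4, h^2/4)) Gaussians.
    w2_squared = (
        (a[0] - b[0]) ** 2
        + (a[1] - b[1]) ** 2
        + ((a[2] - b[2]) / 2) ** 2
        + ((a[3] - b[3]) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)

# The same 3-pixel horizontal shift applied to a tiny box and to a large box.
tiny_a, tiny_b = (10.0, 10.0, 6.0, 6.0), (13.0, 10.0, 6.0, 6.0)
large_a, large_b = (100.0, 100.0, 60.0, 60.0), (103.0, 100.0, 60.0, 60.0)

print("IoU tiny / large:", iou(tiny_a, tiny_b), iou(large_a, large_b))
print("NWD tiny / large:", nwd(tiny_a, tiny_b), nwd(large_a, large_b))
```

Under these assumptions the shift cuts the IoU of the tiny pair to one third while the large pair stays above 0.9, whereas the NWD value depends only on the offset and is identical for both pairs.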

translated by Google Translate

HeteroQA: Learning towards Question-and-Answering through Multiple Information Sources via Heterogeneous Graph Modeling

Shen Gao , Yuchi Zhang , Yongliang Wang , Yang Dong , Xiuying Chen , Dongyan Zhao , Rui Yan

Category: Natural Language Processing | Artificial Intelligence

2021-12-27

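Community Question Answering (CQA) is a well-defined task that can be used in many scenarios, such as e-commerce and online user communities for special interests. In these communities, users can post articles, leave comments, raise questions, and answer them. These data form heterogeneous information sources, where each source has its own structure and context (comments attached to an article, or related questions with answers). Most CQA methods only incorporate articles or Wikipedia to extract knowledge and answer the user's question. However, these methods do not fully explore the various information sources in the community, and these multiple information sources (MIS) can provide more knowledge relevant to users' questions. We therefore propose a question-aware heterogeneous graph transformer that incorporates the MIS in the user community to automatically generate answers. To evaluate the proposed method, we conduct experiments on two datasets: $\text{MSM}^{\text{plus}}$, a modified version of the benchmark dataset MS-MARCO, and the AntQA dataset, which is the first large-scale CQA dataset with four types of MIS. Extensive experiments on the two datasets show that our model outperforms all baselines on all metrics.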

EZInterviewer: To Improve Job Interview Performance with Mock Interview Generator

Mingzhe Li , Xiuying Chen , Weiheng Liao , Yang Song , Tao Zhang , Dongyan Zhao , Rui Yan

Category: Natural Language Processing

2023-01-03

The interview is regarded as one of the most crucial steps in recruitment. To fully prepare for interviews with recruiters, job seekers usually practice mock interviews with each other. However, such a mock interview with peers is generally far from the real interview experience: the mock interviewers are not guaranteed to be professional and are unlikely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to hold online interviews, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) interview data are now available but still low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and the dialog generator, so that most parameters can be trained with ungrounded dialogs and resume data, which are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results in generating mock interviews. With the help of EZInterviewer, we hope to make mock interview practice easier for job seekers.

Follow the Timeline! Generating Abstractive and Extractive Timeline Summary in Chronological Order

Xiuying Chen , Mingzhe Li , Shen Gao , Zhangming Chan , Dongyan Zhao , Xin Gao , Xiangliang Zhang , Rui Yan

Category: Natural Language Processing

2023-01-02

Nowadays, time-stamped web documents related to a general news query flood the Internet, and timeline summarization aims to concisely summarize the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the feature of event-level attention in its generation process, with sequential information retained, and use it to simulate the evolutionary attention of the ground-truth summary. The event-level attention can also be used to assist in extracting the summary, so that the extracted summary also follows the time sequence. We augment the previous large-scale Chinese timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.

Scientific Paper Extractive Summarization Enhanced by Citation Graphs

Xiuying Chen , Mingzhe Li , Shen Gao , Rui Yan , Xin Gao , Xiangliang Zhang

Category: Natural Language Processing

2022-12-08

In a citation graph, adjacent paper nodes share related scientific terms and topics. The graph thus conveys unique structural information about document-level relatedness that can be used in the paper summarization task to go beyond intra-document information. In this work, we focus on leveraging citation graphs to improve scientific paper extractive summarization under different settings. We first propose a Multi-granularity Unsupervised Summarization model (MUS) as a simple and low-cost solution to the task. MUS finetunes a pre-trained encoder model on the citation graph through link prediction tasks. Then, the abstract sentences are extracted from the corresponding paper considering multi-granularity information. Preliminary results demonstrate that the citation graph is helpful even in a simple unsupervised framework. Motivated by this, we next propose a Graph-based Supervised Summarization model (GSS) to achieve more accurate results on the task when large-scale labeled data are available. Apart from employing link prediction as an auxiliary task, GSS introduces a gated sentence encoder and a graph information fusion module that use the graph information to refine the sentence representation. Experiments on a public benchmark dataset show that MUS and GSS bring substantial improvements over the prior state-of-the-art model.

Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models

Xiuying Wei , Yunchen Zhang , Xiangguo Zhang , Ruihao Gong , Shanghang Zhang , Qi Zhang , Fengwei Yu , Xianglong Liu

Category: Machine Learning

2022-09-27

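The Transformer architecture has become the fundamental element of widely used natural language processing (NLP) models. With the trend toward large NLP models, increasing memory and computation costs hinder their efficient deployment on resource-limited devices. Transformer quantization has therefore attracted wide research interest. Recent work recognizes that structured outliers are the critical bottleneck for quantization performance. However, the proposed methods increase the computation overhead and still leave the outliers in place. To address this problem fundamentally, this paper delves into the inherent cause and the importance of the outliers. We discover that $\boldsymbol{\gamma}$ in LayerNorm (LN) acts as a sinful amplifier for the outliers, and that the importance of outliers varies greatly: some outliers, contributed by a few tokens, cover a large range but can be clipped sharply without negative impact. Motivated by these findings, we propose an outlier suppression framework with two components: Gamma Migration and Token-Wise Clipping. Gamma Migration moves the outlier amplifier to subsequent modules through an equivalent transformation, yielding a more quantization-friendly model without any extra burden. Token-Wise Clipping takes advantage of the large variance in token ranges and designs a token-wise coarse-to-fine pipeline that efficiently obtains a clipping range with minimal final quantization loss. The framework effectively suppresses the outliers and can be used in a plug-and-play mode. Extensive experiments show that our framework surpasses existing works and, for the first time, pushes 6-bit post-training BERT quantization to the full-precision (FP) level. Our code is available at https://github.com/wimh966/outlier_suppression.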
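
The equivalent transformation behind Gamma Migration can be illustrated with a small sketch. It assumes the simplest setting, a LayerNorm followed directly by one linear layer, and uses the identity W(γ ⊙ n + β) = (W diag(γ))(n + β/γ), which holds when every entry of γ is non-zero. All function names and values below are hypothetical and chosen for illustration; they are not taken from the authors' repository.

```python
import math

def normalize(x, eps=1e-5):
    """LayerNorm without affine parameters: zero mean, unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(x, weight, bias):
    """Compute W x + b, with `weight` stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def original_block(x, gamma, beta, weight, bias):
    """Standard LayerNorm (scale gamma, shift beta) followed by a linear layer."""
    ln_out = [g * n + b for g, n, b in zip(gamma, normalize(x), beta)]
    return ln_out, linear(ln_out, weight, bias)

def migrated_block(x, gamma, beta, weight, bias):
    """Non-scaling LayerNorm; gamma is folded into the next layer's weight columns."""
    ln_out = [n + b / g for g, n, b in zip(gamma, normalize(x), beta)]
    folded = [[w * g for w, g in zip(row, gamma)] for row in weight]
    return ln_out, linear(ln_out, folded, bias)

# Illustrative values: the second channel has a large gamma that amplifies its activation.
x = [1.0, -2.0, 0.5, 3.0]
gamma = [0.5, 8.0, 1.2, 0.9]
beta = [0.1, -0.4, 0.0, 0.3]
weight = [[0.2, -0.1, 0.4, 0.3], [-0.5, 0.25, 0.1, 0.0]]
bias = [0.05, -0.02]

ln_a, out_a = original_block(x, gamma, beta, weight, bias)
ln_b, out_b = migrated_block(x, gamma, beta, weight, bias)
print("max |LN output| before / after:", max(map(abs, ln_a)), max(map(abs, ln_b)))
print("linear outputs:", out_a, out_b)
```

The two linear outputs agree up to floating-point error, while the activation that would be quantized (the LayerNorm output) no longer carries the large γ scale. Any other consumer of the LayerNorm output, such as a residual branch, would need the same γ rescaling; that is omitted here.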

Cross Modal Transformer via Coordinates Encoding for 3D Object Detection

Junjie Yan , Yingfei Liu , Jianjian Sun , Fan Jia , Shuailin Li , Tiancai Wang , Xiangyu Zhang

Category: Computer Vision

2023-01-03

In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.

A Survey On Few-shot Knowledge Graph Completion with Structural and Commonsense Knowledge

Haodi Ma , Daisy Zhe Wang

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2023-01-03

Knowledge graphs (KGs) have served as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly added relations that do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to tackle the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.

Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation

Yue Han , Jiangning Zhang , Zhucun Xue , Chao Xu , Xintian Shen , Yabiao Wang , Chengjie Wang , Yong Liu , Xiangtai Li

Category: Computer Vision

2023-01-03

Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; e.g., we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.

RELIANT: Fair Knowledge Distillation for Graph Neural Networks

Yushun Dong , Binchi Zhang , Yiling Yuan , Na Zou , Qi Wang , Jundong Li

Category: Machine Learning

2023-01-03

Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. It is therefore difficult to deploy them on edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution for compressing GNNs, where a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle this problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
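
As background for the teacher-student setup described above, the following minimal sketch shows a generic temperature-scaled distillation loss, in which the student is penalized by the KL divergence between the teacher's and its own softened class distributions. This is the standard KD objective only; it does not implement RELIANT's fairness mechanism, and the logits and temperature are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by `temperature`."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl

# Hypothetical per-node class logits from a teacher GNN and two candidate students.
teacher = [2.0, 0.5, -1.0]
close_student = [1.8, 0.6, -0.9]
far_student = [-1.0, 0.5, 2.0]

print("loss (close student):", distillation_loss(close_student, teacher))
print("loss (far student):  ", distillation_loss(far_student, teacher))
```

A student whose logits track the teacher's receives a loss near zero, while one that ranks the classes differently is penalized more heavily.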
